REMARKS/ARGUMENTS 
Claims 44-47 and 49-51 are pending in this application. The rejections to the claims are 
respectfully traversed. 

Claim Rejections - 35 U.S.C. §§101/112, First Paragraph

Claims 44-47 and 49-51 are rejected under 35 U.S.C. §101, allegedly because the claimed 
invention is not supported by either a specific asserted utility or a well-established utility. 

Claims 44-47 and 49-51 are further rejected under 35 U.S.C. §112, first paragraph,
allegedly "since the claimed invention is not supported by either a specific and substantial 
asserted utility or a well established utility, one skilled in the art would not know how to use the 
claimed invention." 

For the reasons outlined below, Applicants respectfully disagree and traverse the 
rejection. With respect to Claims 44-47 and 49-51, Applicants submit that not only has the 
Patent Office not established a prima facie case for lack of utility and enablement, but that the 
PR0343 polypeptides possess a credible, specific and substantial asserted utility and are fully 
enabled. 

Applicants have asserted utility for the instantly claimed PR0343 polypeptide based on
amplification of the PR0343 gene in the "gene amplification assay" described in the instant
specification in Example 92. Gene amplification is an essential mechanism for oncogene
activation. It is well known that gene amplification occurs in most solid tumors and is generally
associated with poor prognosis. As described in Example 92 of the present application, the
inventors isolated genomic DNA from a variety of primary cancers and cancer cell lines that are
listed in Table 9 (pages 230-234 of the specification), including primary lung and colon cancers
of the type and stage indicated in Table 8 (page 227). As a negative control, DNA was isolated
from the cells of ten normal healthy individuals, pooled, and used as a control
(page 222, lines 34-36). Gene amplification was monitored using real-time quantitative
TaqMan™ PCR. The gene amplification results are set forth in Table 9. As explained in the
passage bridging pages 222 and 223, the results of TaqMan™ PCR are reported in ΔCt units.
One unit corresponds to one PCR cycle, or approximately a 2-fold amplification relative to
control; two units correspond to 4-fold, three units to 8-fold amplification, and so on. PR0343
showed ΔCt values of approximately 1.00-3.62 in seven lung tumors and 1.15-3.49 in thirteen
colon tumors. This corresponds to at least 2.00- to 12.3-fold amplification in lung tumors and at
least 2.22- to 11.24-fold amplification in colon tumors. Accordingly, the present specification
clearly discloses strong evidence that the gene encoding the PR0343 polypeptide is significantly
amplified in a substantial number of lung and colon tumors.
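The ΔCt-to-fold conversion described above amounts to fold ≈ 2^ΔCt. The short Python sketch below reproduces the quoted ranges. It assumes, as the one-doubling-per-unit explanation implies, ideal (100%) PCR efficiency; the function name `fold_amplification` is illustrative only.

```python
def fold_amplification(delta_ct: float) -> float:
    """Fold amplification relative to control, assuming each ΔCt unit is one doubling."""
    return 2.0 ** delta_ct

# ΔCt ranges stated above for PR0343
delta_ct_ranges = {
    "lung (7 tumors)": (1.00, 3.62),
    "colon (13 tumors)": (1.15, 3.49),
}

for tissue, (low, high) in delta_ct_ranges.items():
    print(f"{tissue}: {fold_amplification(low):.2f}- to {fold_amplification(high):.2f}-fold")
```

Real TaqMan assays rarely run at exactly 100% efficiency, so these fold values are approximations, consistent with the "approximately" used in the text.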

In further support for the "significance" of the amplification, Applicants had submitted, in 
their Response filed March 11, 2003, a Declaration by Dr. Audrey Goddard. Applicants 
particularly draw the Examiner's attention to page 3 of the Goddard Declaration which clearly 
states that: 

It is further my considered scientific opinion that an at least 2-fold increase in 
gene copy number in a tumor tissue sample relative to a normal (i.e., non-tumor) 
sample is significant and useful in that the detected increase in gene copy number 
in the tumor sample relative to the normal sample serves as a basis for using 
relative gene copy number as quantitated by the TaqMan PCR technique as a 
diagnostic marker for the presence or absence of tumor in a tissue sample of 
unknown pathology. Accordingly, a gene identified as being amplified at least 
2-fold by the quantitative TaqMan PCR assay in a tumor sample relative to a 
normal sample is useful as a marker for the diagnosis of cancer, for monitoring 
cancer development and/or for measuring the efficacy of cancer therapy. 
(Emphasis added). 

In addition, the Goddard Declaration clearly establishes that the TaqMan real-time PCR 
method described in Example 92 has gained wide recognition for its versatility, sensitivity and 
accuracy, and is in extensive use for the study of gene amplification. The facts disclosed in the 
Declaration also confirm that based upon the gene amplification results, one of ordinary skill 
would find it credible that PR0343 is a diagnostic marker of lung or colon cancer. 

The Examiner notes that "the present claims are drawn to the polypeptide PR0343, not
the polynucleotide" and that there is therefore allegedly no specific, credible or substantial
utility for the claimed polypeptides. The Examiner quotes Chen et al. to support this view.

Applicants submit that they had presented supportive evidence with their response mailed
August 11, 2004 to show that the art generally teaches that "it is more likely than not" for
amplified genes to also result in increased mRNA and protein levels. First, the articles by
Orntoft et al., Hyman et al., and Pollack et al. collectively teach that, in general, gene
amplification increases mRNA expression. For instance, Orntoft et al. studied transcript levels
of 5600 genes in malignant bladder cancers, many of which were linked to the gain or loss of
chromosomal material, and found that in general (18 of 23 cases) chromosomal areas with more
than 2-fold gain of DNA showed a corresponding increase in mRNA transcripts. Orntoft et al.
showed a clear correlation between mRNA and protein expression levels in the proteins they
studied and state that, "In general there was a highly significant correlation (p<0.005) between
mRNA and protein alterations. ... 26 well focused proteins whose genes had a known
chromosomal location were detected in TCCs 733 and 335, and of these 19 correlated (p<0.005)
with the mRNA changes detected using the arrays." (See page 42, column 2 to page 34,
column 2). Accordingly, Orntoft et al. clearly support Applicants' position that proteins
expressed by genes that are amplified in tumors are useful as cancer markers.

Similarly, Hyman et al. compared DNA copy numbers and mRNA expression of over
12,000 genes in breast cancer tumors and cell lines, and found evidence of a prominent
global influence of copy number changes on gene expression levels. In Pollack et al.,
the authors profiled DNA copy number alteration across 6,691 mapped human genes in 44
predominantly advanced primary breast tumors and 10 breast cancer cell lines, and found that, on
average, a 2-fold change in DNA copy number was associated with a corresponding 1.5-fold
change in mRNA levels. In summary, the evidence supports Applicants' position that gene
amplification is more likely than not predictive of increased mRNA and polypeptide levels.

Second, the Declaration of Dr. Paul Polakis, principal investigator of the Tumor Antigen
Project of Genentech, Inc., the assignee of the present application, shows that, in general, there is
a correlation between mRNA levels and polypeptide levels.

Applicants further submit that, contrary to the Examiner's assertion, the cited Chen et al.
reference does not conclusively establish a prima facie case for lack of utility for the PR0343
polypeptide. For instance, Applicants note that the proteins selected for study in Chen et al.
were identified by staining of 2D gels. As is well known, there are problems with selecting
proteins detectable by 2D gels: "It is apparent that without prior enrichment only a relatively
small and highly selected population of long-lived, highly expressed proteins is observed. There
are many more proteins in a given cell which are not visualized by such methods. Frequently it is
the low abundance proteins that execute key regulatory functions" (page 1870, col. 1). Thus,
Chen et al., by selecting proteins visualized by 2D gels, are likely to have excluded from their
analysis many key regulatory proteins that could be candidate cancer markers.

Second, the Chen data were averaged and analyzed in a manner vastly different from that of
the instant specification. For example, Chen et al. studied expression levels across a set of
samples that included a large number of tumor samples (76) and a much smaller group of normal
samples (9). The authors determined the global relationship between mRNA and corresponding
protein expression using the average expression values for all 85 lung tissue samples. The
authors chose an arbitrary threshold of 0.115 for the correlation to be considered significant.
This resulted in negative normalized protein values in some cases, and the authors concluded that
it is not possible to predict overall protein expression based on average mRNA abundance. Once
again, Applicants remind the Examiner that the utility standard does not require accurate
prediction of protein values; it requires only that, for a majority of the proteins studied, it is
more likely than not that protein levels increased when mRNA levels increased. A review of the
correlation coefficient data presented in the Chen et al. paper indicates that, in fact, Chen
teaches that "it is more likely than not" that increased mRNA expression correlates well with
increased protein expression. For instance, a review of Table I, which lists 66 genes [the paper
incorrectly states there are 69 genes listed] for which only one protein isoform is expressed,
shows that 40 genes out of 66 had a positive correlation between mRNA expression and protein
expression. This clearly meets the test of "more likely than not". Similarly, in Table II, 30 genes
with multiple isoforms [again the paper incorrectly states there are 29] were presented. In this
case, for 22 genes out of 30, at least one isoform showed a positive correlation between mRNA
expression and protein expression. Furthermore, 12 genes out of 29 showed a strong positive
correlation [as determined by the authors] for at least one isoform. No genes showed a
significant negative correlation. It is not surprising that not all isoforms are positively correlated
with mRNA expression. Thus, Table II also shows that it is more likely than not that protein
levels will correlate with mRNA expression levels.
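One way to make the "more likely than not" reading of the Chen et al. tables concrete is to treat it as a simple majority test on the counts stated above. The sketch below performs only that arithmetic. It is an illustrative reading, not a statistical significance test, and the function and variable names are hypothetical.

```python
from fractions import Fraction

def exceeds_half(positive: int, total: int) -> bool:
    """True when positive/total is strictly greater than one half."""
    return Fraction(positive, total) > Fraction(1, 2)

# Counts as stated in the response for Chen et al., Tables I and II
counts = {
    "Table I, single-isoform genes with positive correlation": (40, 66),
    "Table II, multi-isoform genes with a positively correlated isoform": (22, 30),
}

for label, (positive, total) in counts.items():
    print(f"{label}: {positive}/{total} = {positive / total:.1%}, "
          f"majority: {exceeds_half(positive, total)}")
```

Exact fractions are used so that a borderline count such as 33/66 is judged as not exceeding one half without floating-point rounding concerns.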

The same authors as in Chen et al. published a later paper, Beer et al., Nature Medicine 8(8):
816-824 (2002) (copy enclosed), which described gene expression in lung adenocarcinomas
and compared it to protein expression. In this paper they report that "these results suggest that
the oligonucleotide microarrays provided reliable measures of gene expression." (pg 817). The
authors also state "these studies indicate that many of the genes identified using gene expression
profiles are likely relevant to lung adenocarcinoma." Clearly the authors of the Chen paper agree
that microarrays provide a reliable measure of gene expression levels and can be used
to identify genes whose overexpression is associated with tumors.

As was discussed in the Utility standard submitted in previous responses, the law does
not require the existence of a "strong" or "linear" correlation between mRNA and protein levels.
Nor does the law require that protein levels be "accurately" predicted. Accordingly, the data of
Chen et al. confirm that there is a general trend between protein expression and transcript levels,
which meets the "more likely than not" standard and shows that a positive correlation exists
between mRNA and protein. Therefore, Applicants submit that the Examiner's utility rejection
is based on a misrepresentation of the scientific data presented in Chen et al. and on the
application of an improper, heightened legal standard. In fact, contrary to what the Examiner
contends, the art indicates that, if a gene is amplified in cancer, it is more likely than not that the
mRNA and the encoded protein will also be expressed at an elevated level. As noted, even in
Chen et al. most genes showed a correlation between increased mRNA and translated protein.

The Examiner also notes that "the claimed sequences merely revealed similarity to
proteases in general" (page 3, last paragraph of Office action). Applicants respectfully assert
that utility for the instant PR0343 is based on the results of the gene amplification assay, not on
structure prediction, and hence such a rejection is moot.

Taken together, although there are some examples in the scientific art that do not fit
within the central dogma of molecular biology that there is a correlation between DNA, mRNA,
and polypeptide levels, these instances are exceptions rather than the rule. In the majority of
amplified genes, as exemplified by Orntoft et al., Hyman et al., Pollack et al., the Polakis
Declaration and the widespread use of array chips, the teachings in the art overwhelmingly show
that gene amplification influences gene expression at the mRNA and protein levels. Therefore,
one of skill in the art would reasonably expect in this instance, based on the amplification data
for the PR0343 gene, that the PR0343 polypeptide is concomitantly overexpressed.
Thus, Applicants have demonstrated utility for the PR0343 polypeptide based on the
gene amplification assay, and Applicants request that the Examiner reconsider the utility for
the present application in light of the present arguments. Furthermore, since the specification
provides detailed protocols for the gene amplification assay, for example, in Example 92, one of
ordinary skill in the art could make the claimed polypeptides and use them in the diagnosis of
lung or colon tumors without undue experimentation.

Hence, Applicants respectfully request reconsideration and reversal of the utility/enablement
rejection of the pending claims under 35 U.S.C. §§101/112, first paragraph.

The present application is believed to be in prima facie condition for allowance, and an 
early action to that effect is respectfully solicited. 

Please charge any additional fees, including any fees for additional extension of time, or 
credit overpayment to Deposit Account No. 08-1641 (referencing Attorney's Docket 
No. 39780-1618 P2C48). Please direct any calls in connection with this application to the
undersigned at the number provided below. 

Respectfully submitted, 

Date: November 9, 2005 

HELLER EHRMAN, LLP 

275 Middlefield Road 
Menlo Park, California 94025 
Telephone: (650) 324-7000 
Facsimile: (650) 324-0638 
Gene-expression profiles predict survival of patients with lung adenocarcinoma
Histopathology is insufficient to predict disease progression and clinical outcome in lung adenocarcinoma. Here we show that gene-expression profiles based on microarray analysis can be used to predict patient survival in early-stage lung adenocarcinomas. Genes most related to survival were identified with univariate Cox analysis. Using either two equivalent but independent training and testing sets, or 'leave-one-out' cross-validation analysis with all tumors, a risk index based on the top 50 genes identified low-risk and high-risk stage I lung adenocarcinomas, which differed significantly with respect to survival. This risk index was then validated using an independent sample of lung adenocarcinomas that predicted high- and low-risk groups. This index included genes not previously associated with survival. The identification of a set of genes that predict survival in early-stage lung adenocarcinoma allows delineation of a high-risk group that may benefit from adjuvant therapy.



Lung cancer remains the leading cause of cancer death in industrialized countries. Most patients with non-small cell lung cancer (NSCLC) present with advanced disease, and despite recent advances in multi-modality therapy, the overall 10-year survival rate remains a dismal 8-10% (ref. 1). However, a significant minority of patients (~25-30%) with NSCLC have stage I disease and receive surgical intervention alone. Although 35-50% of patients with stage I disease will relapse within 5 years (refs. 2-4), it is not currently possible to identify specific high-risk patients.

Adenocarcinoma is currently the predominant histological subtype of NSCLC (refs. 1,5,6). Although morphological assessment of lung carcinomas can roughly stratify patients, there is a need to identify patients at high risk for recurrent or metastatic disease. Preoperative variables that affect survival of patients with NSCLC have been identified (refs. 7-10). Tumor size, vascular invasion, poor differentiation, high tumor-proliferative index and several genetic alterations, including K-ras (refs. 11,12) and p53 (refs. 10,13) mutations, have prognostic significance. Multiple independently assessed genes or gene products have also been investigated to better predict patient prognosis in lung cancer (refs. 14-18). Technologies that simultaneously analyze the expression of thousands of genes (ref. 19) can be used to correlate gene-expression patterns with numerous clinical parameters, including patient outcome, to better predict tumor behavior in individual patients (ref. 20). Analyses of lung cancers using array technologies have identified subgroups of tumors that differ according to tumor type and histological subclasses and, to a lesser extent, survival among adenocarcinoma patients (refs. 21,22). Here we correlated gene-expression profiles with clinical outcome in a cohort of patients with lung adenocarcinoma and identified specific genes that predict survival among patients with stage I disease. For further validation, we also show that the risk index predicted survival in an independent cohort of stage I lung adenocarcinomas.

Hierarchical profile clustering yields three tumor subsets 

Using oligonucleotide arrays, we generated gene-expression profiles for 86 primary lung adenocarcinomas, including 67 stage I and 19 stage III tumors, as well as 10 non-neoplastic lung samples. Selected sample replicates showed high correlation coefficients and reliable reproducibility. We determined transcript abundance using a custom algorithm and the data set was trimmed of genes expressed at extremely low levels, that is, genes were excluded if the measure of their 75th percentile value was less than 100. Although potentially resulting in the loss of some information, trimming in this manner decreased the possibility that the clustering algorithm would be strongly influenced by genes with little or no expression in these samples.
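The trimming rule above (exclude a gene when its 75th-percentile value across samples is below 100) can be sketched with NumPy as follows. The toy matrix and the function name are hypothetical, and NumPy's default linear-interpolation percentile is an assumption; the authors' custom algorithm may compute the percentile differently.

```python
import numpy as np

def trim_low_expression(expr: np.ndarray, threshold: float = 100.0) -> np.ndarray:
    """Return a boolean mask over genes (rows): True = keep.

    A gene is excluded when the 75th percentile of its values across
    samples (columns) is less than `threshold`.
    """
    q75 = np.percentile(expr, 75, axis=1)
    return q75 >= threshold

# Hypothetical toy matrix: 4 genes x 5 samples (values are made up)
expr = np.array([
    [120, 150, 90, 300, 210],   # 75th percentile 210 -> kept
    [10, 20, 15, 5, 30],        # 75th percentile 20  -> excluded
    [80, 95, 99, 101, 60],      # 75th percentile 99  -> excluded
    [500, 20, 30, 40, 600],     # 75th percentile 500 -> kept
])
mask = trim_low_expression(expr)
print(mask)
trimmed = expr[mask]            # genes that would go on to clustering
```

Using a high percentile rather than the mean keeps genes that are strongly expressed in only a subset of tumors, such as the last toy row.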
Hierarchical clustering with the resulting 4,966 genes yielded 3 clusters of tumors (Fig. 1). All 10 non-neoplastic samples clustered tightly together within Cluster 1 (not shown). We examined the relationships between cluster and patient and tumor characteristics (Fig. 1 and Supplementary Figure A online). There were associations between cluster and stage (P = 0.030) and between cluster and differentiation (P = 0.01). Cluster 1 contained the greatest percentage (42.8%) of well differentiated tumors, followed by Cluster 2 (27%) and Cluster 3 (4.7%). Cluster 3 contained the highest percentage of both poorly differentiated (47.6%) and stage III tumors (42.8%), yet contained 3 (14.3%) moderately differentiated and 1 (5%) well differentiated stage I tumor. Notably, 11 stage I tumors were present in Cluster 3, suggesting a common gene-expression profile for this subset of stage I and stage III tumors.

For patients with stage I and stage III tumors, the average ages were 68.1 and 64.5 years and the percentage of smokers was 88.9% and 89.5%, respectively. Marginally significant associations between cluster and smoking history were observed (P = 0.06). A significant relationship between histopathological classification and cluster was only discernible for bronchioloalveolar adenocarcinomas (BAs), which were only present in Clusters 1 and 2 (P = 0.0055) and comprised 35.7% and 12.3% of tumors for Clusters 1 and 2, respectively.

We examined the heterogeneity in gene-expression profiles based on the trimmed data set among normal lung samples and stage I and stage III adenocarcinomas by calculating correlation coefficients between all pairs of samples. In contrast to normal lung samples that displayed highly similar gene-expression profiles (median correlation, 0.9), both stage I and III lung tumors demonstrated much greater heterogeneity in their expression profiles with lower correlation coefficients (median values, 0.82 and 0.79, respectively).
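The heterogeneity comparison above can be sketched as the median correlation over all distinct pairs of samples. This assumes Pearson correlation computed on a genes × samples matrix (the text says only "correlation coefficients"). The simulated data and function name are hypothetical; they merely illustrate that samples with more independent variation give a lower median correlation.

```python
import numpy as np

def median_pairwise_correlation(expr: np.ndarray) -> float:
    """Median Pearson correlation over all distinct pairs of samples.

    expr: genes x samples; rowvar=False makes np.corrcoef treat each
    column (sample) as one variable.
    """
    corr = np.corrcoef(expr, rowvar=False)
    i_upper = np.triu_indices_from(corr, k=1)   # each pair once, no diagonal
    return float(np.median(corr[i_upper]))

# Simulated data: 200 genes; samples share a common profile plus noise
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
homogeneous = shared + 0.3 * rng.normal(size=(200, 10))    # low noise, like normal lung
heterogeneous = shared + 1.0 * rng.normal(size=(200, 10))  # high noise, like tumors
print(median_pairwise_correlation(homogeneous),
      median_pairwise_correlation(heterogeneous))
```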

Northern-blot and immunohistochemistry analyses 

Of the 4,966 genes examined, 967 differed significantly between stage I and III adenocarcinomas, a number in excess of that expected by chance alone (248 at an alpha level (α) of 0.05). Three genes were arbitrarily selected to verify the microarray expression data. The mRNA from 20 of the normal lung and tumor samples was examined by northern-blot hybridization with probes for insulin-like growth factor-binding protein 3 (IGFBP3), cystatin C and lactate dehydrogenase A (LDH-A) (Fig. 2a). Two gene probes not represented on the microarrays were used as controls, including histone H4, a potential index of overall cell proliferation, and 28S ribosomal RNA, a control for sample loading and transfer. The relative amounts of IGFBP3, cystatin C and LDH-A mRNA strongly correlated with microarray-based measurements (Fig. 2b). In both assays, IGFBP3 and LDH-A mRNA levels increased from stage I to stage III adenocarcinomas and were higher than those in normal lung. Cystatin C mRNA levels were more variable but relatively greater in normal lung than tumors. These results suggest that the oligonucleotide microarrays provided reliable measures of gene expression. The tumors showed slightly greater histone H4 expression than the normal lung, likely reflecting increased proliferation of tumor cells.

Fig. 1 Unsupervised classification analysis of lung adenocarcinomas. Three classes of tumors identified by agglomerative hierarchical clustering of gene-expression profiles using the 4,966 expressed genes. Patient and histopathological information for each lung adenocarcinoma case by cluster designation and methods for K-ras 12th/13th-codon mutational status and nuclear p53 protein accumulation are provided (Supplementary Figure A online). TN classification denotes information regarding patient tumor size and nodal involvement. Associations between cluster membership and patient or histopathological variables are indicated at significance level (P ≤ 0.05).
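The "expected by chance" figure quoted in the northern-blot section is the number of tests multiplied by the significance level: 4,966 × 0.05 ≈ 248. The sketch below reproduces that arithmetic. It treats the 4,966 gene-wise tests as independent, which is only a rough baseline because expression values of different genes are correlated.

```python
n_genes = 4966      # genes tested
alpha = 0.05        # per-gene significance level
observed = 967      # genes reported as differing between stage I and III

expected = n_genes * alpha   # expected false positives if no gene truly differed
print(f"expected by chance: {expected:.1f} (about {round(expected)})")
print(f"observed is {observed / expected:.1f} times the chance expectation")
```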

Immunohistochemistry was performed for IGFBP3, cystatin C and HSP-70 to determine whether mRNA overexpression was reflected by an increase of their corresponding proteins in tumors. Immunoreactivity for both IGFBP-3 and HSP-70 (Fig. 2c) was detected in the cytoplasm of the adenocarcinomas, with little detectable reactivity in the stromal or inflammatory cells. Cystatin C was detected in alveolar pneumocytes and intra-alveolar macrophages in non-neoplastic lung parenchyma and also consistently in the cytoplasm of neoplastic cells.

Fig. 2 Validation analyses of gene-expression profiling. a, Northern-blot analysis of selected candidate genes for verification of data obtained from oligonucleotide arrays. The same sample RNA for the 4 uninvolved lung, 8 stage I and 8 stage III tumors was used for the northern-blot and oligonucleotide array analyses. b, Correlation analysis of quantitative data obtained from oligonucleotide arrays and northern blots measured by integrated PhosphorImager-based signals for the IGFBP3 and LDH-A genes. The ratio of IGFBP3, cystatin C and LDH-A mRNA to 28S rRNA was determined. The relative values for each gene from each sample are shown. n, non-neoplastic normal lung; 1, stage I tumors; 3, stage III tumors. c, Immunohistochemical analysis of IGFBP-3, HSP-70 and cystatin C in lung and lung adenocarcinomas. Cytoplasmic IGFBP-3 immunoreactivity in a neoplastic gland (tumor L22) with prominent apical staining (blue reactant staining, arrow, upper left). Diffuse cytoplasmic HSP-70 immunoreactivity (tumor U27), yet stromal elements show no reactivity (upper right). Normal lung parenchyma (lower left) shows cytoplasmic cystatin C immunoreactivity in alveolar pneumocytes (arrow) and intra-alveolar macrophages, but tumor (L90) shows diffuse cytoplasmic cystatin C immunoreactivity with prominent apical staining (lower right). Magnification, ×200.

Gene-expression profiles predict survival 
As expected, Kaplan-Meier survival curves (Fig. 3a) and log-rank tests indicated poorer survival among stage III compared with stage I adenocarcinomas (P < 0.0001). Two statistical approaches were used to determine whether gene-expression profiles could predict survival using the data set of 4,966 genes. In one approach, equal numbers of randomly assigned stage I and stage III tumors constituted training (n = 43) and testing (n = 43) sets. In the training set, the top 10, 20, 50 or 75 genes were used to create risk indices that were evaluated for their association with survival using the 50th, 60th or 70th percentile cutoff points to categorize patients into high or low groups. The results were similar across cutoff points, but the 50-gene risk index had the best overall association with survival in the training set.



Fig. 3 Gene-expression profiles and patient survival. a, Relationship between tumor stage and patient survival (stage I and stage III differ significantly, P < 0.0001). b, Relationship between the survival in the 43 test samples and their risk assignments based on the 50-gene risk index estimated in the 43 training samples. The high- and low-risk groups differ significantly (P = 0.024). c, Relationship between patient survival and the risk assignments in test samples (in b) conditional on tumor stage. The high- and low-risk stage I groups differ significantly (P = 0.028), whereas stage III low- and high-risk groups did not (P = 0.634). d, Relationship between survival in the test cases and their risk assignments based on the 'leave-one-out' cross-validation of the 50-gene risk index across all 86 tumors. The high- and low-risk groups differ significantly (P = 0.0006). e, Relationship between test cases' risk assignment and survival (in d) conditional on tumor stage. The high- and low-risk stage I lung adenocarcinoma groups differ significantly from each other (P = 0.003), whereas low- and high-risk stage III tumors do not. f, Relationship between tumor class identified by hierarchical clustering and patient survival. Survival for patients in Cluster 3 differed relative to the tumors in Cluster 2 (P = 0.037) and approached significance for Clusters 1 and 2 combined (P = 0.06). g, Analysis of the Michigan-based risk index using top cross-validated survival genes identifies a low- and high-risk group in an independent cohort of 84 Massachusetts-based lung adenocarcinomas that are significantly different (P = 0.003). h, Among the 62 stage I lung adenocarcinomas in the Massachusetts sample, the high- and low-risk groups differed significantly (P = 0.006).

After conservatively choosing the 60th percentile cutoff point from the training set, we then applied this risk index and cutoff point to the testing set. The risk index of the top 50 genes correctly identified low- and high-risk individuals within the independent testing set (P = 0.024) (Fig. 3b and Supplementary Methods online). Notably, 11 stage I tumors were included in the high-risk subgroup. When this risk assignment was then conditionally examined for stage progression (Fig. 3c), low- and high-risk groups among stage I tumors were found to differ (P = 0.028) in their survival.
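A minimal sketch of the training-testing procedure described above, assuming the risk index is a weighted sum of expression values with one coefficient per selected gene. The text specifies univariate Cox analysis for gene selection and the 60th percentile cutoff, but not the exact index formula, so that form is an assumption. The coefficients and data are hypothetical; the point illustrated is that the cutoff is fixed on the training set and applied unchanged to the test set.

```python
import numpy as np

def risk_scores(expr: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """Weighted-sum risk score per sample (assumed form of the index).

    expr: samples x selected-genes array; betas: one coefficient per gene.
    A positive coefficient means higher expression -> poorer outcome.
    """
    return expr @ betas

def assign_risk(train_expr, test_expr, betas, percentile=60.0):
    """Fix the cutoff on the training scores, then apply it to the test set."""
    cutoff = np.percentile(risk_scores(train_expr, betas), percentile)
    test_scores = risk_scores(test_expr, betas)
    labels = np.where(test_scores > cutoff, "high", "low")
    return labels, cutoff

# Hypothetical coefficients for three selected genes and simulated data
betas = np.array([0.8, -0.5, 0.3])
rng = np.random.default_rng(1)
train_expr = rng.normal(size=(43, 3))
test_expr = np.array([
    [10.0, -10.0, 10.0],    # strongly "poor outcome" pattern
    [-10.0, 10.0, -10.0],   # strongly "good outcome" pattern
])
labels, cutoff = assign_risk(train_expr, test_expr, betas)
print(labels, round(float(cutoff), 3))
```

Freezing the cutoff before looking at the test set is what makes the test-set P value an honest check rather than a re-fit.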

Identification of a robust set of survival genes

Although predictive of patient survival, a single training-testing set may not provide the most robust set of genes due to random sampling issues. Therefore, a 'leave-one-out' cross-validation approach was used to identify genes associated with survival from all 86 tumor samples. We first developed a 50-gene risk index in each training set, and then applied the risk index to the test case held out from the full set of tumors and assigned the held-out tumor to the high- or low-risk groups (Fig. 3d). The high- and low-risk subgroups determined in the test cases differed significantly in their overall survival (P = 0.0006). Among the larger group of stage I lung adenocarcinomas, the low-risk (n = 46) and high-risk (n = 21) groups had markedly different survival (P = 0.003) (Fig. 3e). Table 1 lists selected examples of the cumulative top 100 genes derived from this cross-validation procedure (complete list in Supplementary Table A online).
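The leave-one-out loop can be sketched as below. To keep the example short and self-contained, it swaps in a deliberately simple, hypothetical gene-ranking rule (the difference in mean expression between patients who died and those who are alive) for the univariate Cox analysis used in the paper, and it reuses the 60th-percentile cutoff as an assumed default. What it does show is the essential structure: gene selection, weights and the cutoff are all recomputed without the held-out tumor before that tumor is classified.

```python
import numpy as np

def loocv_risk_groups(expr, died, n_top=50, percentile=60.0):
    """Assign each sample to a 'high' or 'low' risk group by leave-one-out.

    expr: samples x genes array; died: boolean array (True = patient died).
    For each held-out sample, gene selection, weights and the cutoff are
    computed from the remaining samples only.
    """
    n_samples = expr.shape[0]
    groups = []
    for i in range(n_samples):
        keep = np.arange(n_samples) != i          # every sample except i
        x_tr, d_tr = expr[keep], died[keep]
        # Hypothetical stand-in for univariate Cox ranking:
        # per-gene difference in mean expression, dead minus alive.
        weights = x_tr[d_tr].mean(axis=0) - x_tr[~d_tr].mean(axis=0)
        top = np.argsort(-np.abs(weights))[:n_top]
        train_scores = x_tr[:, top] @ weights[top]
        cutoff = np.percentile(train_scores, percentile)
        held_out_score = expr[i, top] @ weights[top]
        groups.append("high" if held_out_score > cutoff else "low")
    return groups

# Simulated data: 30 patients x 200 genes, signal planted in the first 5 genes
rng = np.random.default_rng(2)
died = np.array([True] * 15 + [False] * 15)
expr = rng.normal(size=(30, 200))
expr[died, :5] += 3.0
groups = loocv_risk_groups(expr, died, n_top=5)

dead_high = sum(g == "high" for g, d in zip(groups, died) if d)
alive_high = sum(g == "high" for g, d in zip(groups, died) if not d)
print(f"high-risk among dead: {dead_high}/15, among alive: {alive_high}/15")
```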

It was also noted that many of the stage I patients in the high-risk subgroup (Fig. 3e) were present in Cluster 3 (Fig. 1). Kaplan-Meier analysis (Fig. 3f) demonstrated a significantly worse survival (P = 0.037) for patients in Cluster 3 relative to patients in Cluster 2, approaching significance relative to Clusters 1 and 2 combined (P = 0.06). This further indicates the important relationship between gene-expression profiles and patient survival, independent of disease stage.

Consistent with previous analyses of lung adenocarcinomas", 
40% of stage I and 57.8% of stage III tumors had 12th or 13th 
codon K-ras gene mutations. Those patients with tumors con- 
taining K-ras mutations showed a trend of poorer survival, but 
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Table 1 Selected examples of the top 1 00 genes from cross-validation 
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Solded genes were also significant for survival In 43 turner training set (Fig. 36). 



Table 1 Selected examples of the cumulative top 1 00 genes Identified using 
training-testing, cross-validation of all 86 lung tumor samples. The percent 
change, as well as the direction, for the average values of the 1 0 norvneoplastk: 
ting to ail tumors, and for the 67 stage I to the 19 stage 111 tumors are shown. A 
positive coefficient 0 value is indicative of a relationship of gene expression to a 



poorer patient outcome. The genes are listed in potential functional categories. 
Cenes that were also present in the top 50 survival genes using the 43-tumor 
training set (Fig. 3b) are indicated in bold type. Complete fisting of the gene 
probe sets and annotated gene and urtfgene identifiers can be found in the 
Supplementary Methods, 
Fig. 4 Gene expression patterns of top survival genes. a, Gene-expression patterns
determined using agglomerative hierarchical clustering of the 86 lung adenocarcinomas
against the 100 survival-related genes (Table 1) identified by the training-testing,
cross-validation analysis. Substantially elevated (red) or decreased (green) expression
of the genes is observed in individual tumors. Some tumors (black arrow and expanded
area) show extremely elevated expression of specific genes. b, An outlier gene-expression
pattern (>5 times the interquartile range among all samples) is observed for the erbB2
and RegIA genes (top left and right, respectively). The S100P and crk genes (bottom left
and right, respectively) show a graded pattern of expression related to patient survival.
○, alive; ●, dead (also in c). c, The number of outliers per person identified in the
top 100 genes, plotted by survival distribution.
this difference did not reach statistical significance among all
patients (P = 0.25), between patients within tumor clusters (P =
0.41) or when analyzed separately among stage I (P = 0.22) and
stage III (P = 0.53) patients. Nuclear accumulation of p53 was
detected in 17.9% of stage I and 22.2% of stage III tumors. No
significant relationship was observed between p53 staining and
patient survival, cluster or tumor stage.

Confirmation using an independent set of adenocarcinomas 

The robustness of our 50-gene risk index in predicting survival in
lung adenocarcinomas was tested using oligonucleotide gene-expression
data obtained from a completely independent
(Massachusetts-based) sample of 84 lung adenocarcinomas (62
stage I, 14 stage II and 8 stage III; ref. 21, and dataset A at
www.genome.wi.mit.edu/MPR/lung). To ensure equivalent
power for testing and comparability of samples, the criteria for
including tumors in the analysis were 40% or greater tumor
cellularity, no mixed histology (that is, adenosquamous) and available
patient survival information. To obtain comparable gene-expression
measures between the two data sets, gene sequences present on
both the U95A and HuGeneFL arrays were examined, and expression
data for our top 50 cross-validation genes for all 84 Massachusetts
samples were obtained and processed²⁴ (see also Supplementary
Methods online). When we examined the risk assignment of
these 84 samples, employing the identical cutoff point used for
the 86 Michigan-based lung samples, we observed low- and high-risk
groups that differed in survival (Fig. 3g; P = 0.003). Notably,
among the 62 stage I tumors, high- and low-risk groups were observed
that differed significantly (P = 0.006) in their survival (Fig. 3h).
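The validation above reuses a cutoff fixed on the Michigan samples rather than re-estimating it, then asks whether the resulting high- and low-risk groups differ in survival. A minimal sketch of that logic, assuming a single numeric cutoff on a precomputed risk score and a standard two-group log-rank test, is shown below; the function names and toy data are hypothetical and this is not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def assign_risk_groups(scores: Sequence[float], cutoff: float) -> List[str]:
    """Label each sample 'high' or 'low' using a cutoff fixed in advance (not re-fitted)."""
    return ["high" if s > cutoff else "low" for s in scores]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[str], group_a: str = "high") -> Tuple[float, float]:
    """Two-group log-rank test; events: 1 = death observed, 0 = censored.

    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Samples still at risk just before time t, overall and in group A.
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for i in at_risk if groups[i] == group_a)
        # Deaths at time t, overall and in group A.
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d_a = sum(1 for i in at_risk
                  if times[i] == t and events[i] == 1 and groups[i] == group_a)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical toy data: risk scores, follow-up in months, 1 = death.
scores = [2.1, 0.4, 1.8, -0.3, 1.2, -1.0]
times = [10, 60, 14, 72, 20, 80]
events = [1, 0, 1, 0, 1, 0]
groups = assign_risk_groups(scores, cutoff=1.0)
chi2, p = logrank_test(times, events, groups)
print(groups, round(chi2, 2), round(p, 3))
```

Keeping the cutoff fixed is what makes the independent data set a genuine test: re-optimizing it on the new samples would bias the P value downward.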

Survival genes had graded and outlier expression patterns 

A statistical and graphical analysis of the 100 survival-related
genes (Table 1) clustered against all 86 tumors revealed individual
tumors with substantially elevated expression in either a limited
or a larger number of genes (Fig. 4a). Among these genes, we
observed two distinct patterns of expression related to patient
survival. One pattern, designated 'outlier', included genes showing
substantially elevated expression (greater than five times the
interquartile range among all samples), whereas the other pattern,
designated 'graded', was characterized by expression that was
continuously distributed with patient survival (Fig. 4b). The erbB2 and
RegIA genes are examples of outlier expression patterns, and the
S100P and crk genes of graded patterns. The number of outliers
per person in the top 100 genes was identified and plotted
according to survival times and events (Fig. 4c). Both stage I and
stage III lung adenocarcinomas showed outlier gene patterns,
and 10 tumors contained 3 or more outlier genes.
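A minimal sketch of the 'outlier' rule and the per-tumor outlier count (Fig. 4c) follows. The text defines an outlier as expression greater than five times the interquartile range among all samples but does not say what that distance is measured from; this sketch assumes a per-gene cutoff of Q3 + 5 × IQR. The function names and toy values are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List


def outlier_threshold(values: List[float], k: float = 5.0) -> float:
    """Assumed outlier cutoff for one gene: upper quartile plus k times the IQR."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def outliers_per_tumor(expr: Dict[str, List[float]], tumors: List[str],
                       k: float = 5.0) -> Dict[str, int]:
    """Count, for each tumor, how many genes exceed that gene's outlier cutoff."""
    counts = {t: 0 for t in tumors}
    for gene, values in expr.items():
        cutoff = outlier_threshold(values, k)
        for tumor, value in zip(tumors, values):
            if value > cutoff:
                counts[tumor] += 1
    return counts


# Hypothetical toy data: one gene with a single extreme value, one graded gene.
tumors = [f"T{i}" for i in range(1, 9)]
expr = {
    "outlier_like": [1.0, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 40.0],
    "graded_like": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
}
counts = outliers_per_tumor(expr, tumors)
print(counts)
```

A graded gene spreads its values evenly, so its IQR is wide and no single tumor crosses the cutoff; an outlier gene is tightly clustered except for a few tumors that far exceed it.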

Because gene amplification may result in increased gene
expression, the nine genes with outlier expression patterns (erbB2,
SLC1A6, Wnt1, MGB1, RegIA, AKAP12, PACE, CYP24, KYNU)
and one gene with a graded expression pattern (KRT18) were
examined using quantitative genomic PCR to evaluate genomic
copy number (Fig. 5a). Gene amplification of erbB2 (17q12) was
detected in tumor L94, which had the highest erbB2 mRNA
expression (Fig. 4a). Gene amplification was not detected for any
of the other seven tested genes, either in tumor L94 or in the other
tumors. The two genes most frequently demonstrating the outlier
pattern in these lung adenocarcinomas were KYNU and
CYP24, present in 10 and 9 tumors, respectively.
CYP24 has been described as a gene amplified and overexpressed
in breast cancer²⁵, and these results indicate elevated expression
in lung adenocarcinoma as well.

To determine whether the graded or outlier gene-expression
patterns also occur at the protein-expression level, 10 of the 100



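[Fig. 5 image. a, Genomic PCR gel: lanes Ctl, L37, L79, L95, L20 and L94, each with normal (N) and tumor (T) DNA; bands for GAPDH and REG1A. b, Immunohistochemistry panels.]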



Fig. 5 Gene amplification and protein expression of survival-related genes.
a, Analysis of potential gene amplification for 9 genes showing outlier
expression patterns in the lung tumors (erbB2, SLC1A6, Wnt1, MGB1, RegIA,
AKAP12, PACE, CYP24 and KYNU), examined using quantitative genomic
PCR. A gene showing a graded expression pattern (KRT18) and one gene
(MC54) with a chromosome location similar to that of PACE were used as controls.
Only erbB2 and RegIA are shown. An esophageal adenocarcinoma with
known high-level genomic amplification of erbB2 was used as a positive
control, and normal esophagus DNA was used as a negative control (Ctl). PCR
fragment sizes were 343 bp for GAPDH, 166 bp for erbB2 and 126 bp for
RegIA. DNA is from normal lung (N) and tumor (T) from each patient (for
example, L37). b, Immunohistochemical analysis of survival-related genes with
lung adenocarcinoma tissue microarrays using the tumors from this study. The
transmembrane erbB2 protein (top left) expression is substantially increased
in tumor L94, which contains the amplified erbB2 gene (Fig. 4a and b). Expression
of VEGF (top right) and S100P (bottom left) was located within the neoplastic
cells, and the pattern of immunoreactivity was consistent with the graded
expression pattern demonstrated by their mRNA profiles. The oncogene crk
(bottom right) was abundantly expressed in neoplastic lung cells.
Magnification, ×400 (erbB2); ×200 (VEGF, S100P and crk).
top survival genes (Table 1) for which specific antibodies were
available were chosen for immunohistochemical analysis using
lung-tumor arrays from this study (Fig. 5b). Expression of membrane
erbB2 protein was substantially increased in the erbB2-amplified
tumor L94, and very low levels of expression were present
in other tumors, consistent with mRNA-expression measurements
(Fig. 4a and b). CDC6 protein expression was also substantially
higher in tumor L94, consistent with mRNA levels
(data not shown). Expression of vascular endothelial growth factor
(VEGF) and S100P (Fig. 5b), as well as cytokeratin 18 (KRT18),
cytokeratin 7 (KRT7) and fas-associated death domain (FADD)
protein (data not shown), was located within the lung tumor
cells and was consistent with the graded expression pattern of the
mRNA profiles. The oncogene crk showed graded patterns of both
mRNA and protein expression with survival, and was abundantly
expressed in the tumor cells (Fig. 5b). These results indicate that
many survival-associated genes are expressed at the protein level
and show similar mRNA and protein-expression patterns.

Discussion 

We used several approaches for the analysis of gene-expression
data related to clinicopathological variables and patient survival.
One approach, hierarchical clustering, was used to examine
similarities among lung adenocarcinomas in their patterns
of gene expression. Previous studies of lung tumors²¹,²² have also
used this method to describe subclasses of lung tumors. Here,
we found three clusters that differed significantly with
respect to tumor stage and tumor differentiation. This suggests,
as expected, that tumors with similar histological features of
differentiation show similarities in gene expression.
This feature also partly underlies the observed statistical
association of tumor stage and cluster, as many of the higher-stage
tumors, often poorly differentiated and previously associated
with reduced survival⁹,¹⁰, were located in Cluster 3. Although
this cluster contained the highest percentage of stage III tumors,
it also contained a nearly equal mixture of stage I and
stage III tumors, and not all of its tumors were poorly differentiated.
This indicates that a subset of stage I lung adenocarcinomas
shares gene-expression profiles with higher-stage tumors.
Notably, 10 of the 11 stage I tumors found in Cluster 3 were the
high-risk stage I tumors identified using the risk index in the
'leave-one-out' cross-validation.

In contrast to previous analyses of lung adenocarcinomas²¹,²²,
we validated the expression data from the arrays. The strong
correlation of northern-blot analysis and oligonucleotide-array data
for gene expression in the same samples (Fig. 2b) indicates that
these studies provide robust gene-expression estimates.
Immunohistochemistry using the same tumor samples in tissue
arrays demonstrates protein expression within the lung tumor
cells. Together, these studies indicate that many of the genes
identified using gene-expression profiles are likely relevant to
lung adenocarcinoma. For example, IGFBP3 gene expression is
increased in lung adenocarcinomas (Fig. 2c). IGFBP3 protein
modulates the autocrine or paracrine effects of insulin-like
growth factors; elevated IGFBP3 expression is observed in colon
cancer²⁶, and increased serum IGFBP3 is associated with progression
in breast cancer²⁷. Heat-shock protein 70 (HSP-70) is increased
in lung adenocarcinomas of smokers²⁸ and is associated
with increased metastatic potential in breast cancer²⁹. Increased
serum lactate dehydrogenase is correlated with tumor stage and
tumor burden³⁰, and cystatin C, a cysteine protease inhibitor
expressed in human lung cancers³¹, is prognostic in some cancers³².
The decreased expression of this protease inhibitor may affect
the invasive properties of the tumor cell.

The cross-validation analytical strategy we used is particularly
informative for these types of gene-expression analyses of
disease outcome³³,³⁴, and identification of cross-validated genes with
a larger tumor cohort may help refine this risk index for use in a
clinical setting. The gene-expression data also provide opportunities
to observe overarching patterns that advance our understanding
of associations between genes and disease. For example,
the top 100 survival genes include those involved in signaling,
cell cycle and growth, transcription, translation and metabolism.
Expression of many of these genes is likely a function of increased
proliferation and metabolism in the more aggressive tumors.
Some genes, such as erbB2 and RegIA (Fig. 4a and b), were highly
overexpressed in a few patients with poor survival. In one
tumor, the erbB2 gene was amplified (Fig. 5a), demonstrating that
genomic changes may underlie the overexpression of a subset of
these outlier genes. Immunohistochemistry confirmed protein
overexpression in this patient's tumor (Fig. 5b). Notably, seven of
the eight outlier genes were not amplified, indicating that other
mechanisms underlie the increased mRNA expression of these
survival-related genes.

Most genes showed a graded relationship between expression
and patient survival. Genes such as that encoding VEGF, known
to be strongly associated with survival in lung cancer³⁵,³⁶, were
identified as related to patient survival in our study. VEGF
showed a graded expression pattern, as did S100P and the
crk oncogene (Fig. 5b). S100P is a calcium-regulated protein not
previously reported in lung cancer. The crk gene, the cellular
homolog of the v-crk oncogene, is a member of a family of adaptor
proteins involved in signal transduction and interacts directly
with c-Jun N-terminal kinase 1 (JNK1)³⁷. Although crk has not
been shown to have a role in lung cancer, its role in the
MAP-kinase pathway, which leads to activation of matrix
metalloproteinase secretion and cell invasion³⁸, indicates potential
involvement in tumor-cell invasion or metastasis in
some lung adenocarcinomas. Among the many genes identified
in this study that, like crk, may be causally involved in lung
cancer progression (Table 1), some were related to survival in many
patients, and others only in smaller subsets of patients. This
result is consistent with the complex molecular architecture of
tumors in general, the heterogeneity of lung adenocarcinomas in
particular and the multiple mechanisms underlying tumor-cell
survival, invasion and metastasis**.

Our results demonstrate that a gene-expression risk profile,
based on the genes most associated with patient survival, can
distinguish stage I lung adenocarcinomas and differentiate
prognoses. The particular genes that define the clusters, or are
associated with survival, likely reflect the characteristics of the
particular tumors included in the analysis. Current therapy for
patients with stage I disease usually consists of surgical resection
without adjuvant treatment^. Clearly, the identification of a
high-risk group among patients with stage I disease would lead
to consideration of additional therapeutic intervention for this
group, possibly leading to improved survival of these patients.

Methods 

Patient population. Sequential patients seen at the University of Michigan
Hospital between May 1994 and July 2000 for stage I or stage III lung
adenocarcinoma were evaluated for this study. Consent was received and the
project was approved by the local Institutional Review Board. Primary
tumors and adjacent non-neoplastic lung tissue were obtained at the time of
surgery. Peripheral portions of resected lung carcinomas were sectioned,
evaluated by a study pathologist, compared with routine H&E sections
of the same tumors and used for mRNA isolation. Regions chosen for
analysis had a tumor cellularity greater than 70% and no mixed histology,
potential metastatic origin, extensive lymphocytic infiltration or fibrosis.
Tumors were histopathologically divided into two categories based on
their growth pattern: bronchial-derived, if they exhibited invasive features
with architectural destruction, and bronchioloalveolar, if they exhibited
preservation of the lung architecture. All stage I patients received only
surgical resection with intra-thoracic nodal sampling and no other treatments.
Stage III patients received surgical resection plus chemotherapy and
radiotherapy.

Gene-expression profiling and K-ras mutation analysis. RNA isolation,
cRNA synthesis and gene-expression profiling were performed as
described³⁴. Details of gene annotation and K-ras mutation analysis are
provided in the supplementary information.

Northern-blot analysis. Total cellular RNA (10 µg) was separated in 1.2%
agarose-formaldehyde gels and vacuum-transferred to GeneScreen Plus
(NEN Life Science Products, Boston, Massachusetts). Hybridization
conditions and probe labeling were as described⁴⁰. Individual
sequence-validated cDNA IMAGE clones for human IGFBP3 (clone 1407750),
LDH-A (clone 2420241) and cystatin C (CST3; clone 949938) were from
Research Genetics (Huntsville, Alabama). The human histone H4 cDNA and
the 28S ribosomal RNA 26-mer oligonucleotide probe were prepared and
labeled as described*.

Gene-amplification analysis. Eleven genes were selected for the analysis of
genomic alterations. Primers were designed using PrimerSelect 4.05 Windows
32 software (DNASTAR, Madison, Wisconsin), avoiding pseudogenes and
potential homologous regions. Forward and reverse primers for the genes are
provided in the Supplementary Methods online. Quantitative genomic PCR was
then applied and analyzed as described⁴¹.
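Fig. 5a compares paired normal (N) and tumor (T) DNA against a GAPDH reference band. One common way to turn real-time genomic PCR into a tumor/normal copy ratio is the comparative threshold-cycle (2^−ΔΔCt) calculation sketched below. The analysis actually used is the one in ref. 41 and is not reproduced here; the ΔΔCt method, the assumed ~100% amplification efficiency and the 2-fold amplification cutoff are all illustrative assumptions, and the function names are hypothetical.

```python
def relative_copy_number(ct_target_tumor: float, ct_ref_tumor: float,
                         ct_target_normal: float, ct_ref_normal: float,
                         efficiency: float = 2.0) -> float:
    """Comparative Ct estimate of the tumor/normal copy ratio for a target locus.

    Each sample is first normalized to a reference locus (e.g. GAPDH); the tumor
    is then compared with the matched normal DNA. efficiency=2.0 assumes the
    template doubles every cycle.
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor      # target vs reference in tumor
    delta_normal = ct_target_normal - ct_ref_normal   # target vs reference in normal
    # Reaching threshold one cycle earlier means ~efficiency-fold more template.
    return efficiency ** (-(delta_tumor - delta_normal))


def is_amplified(ratio: float, threshold: float = 2.0) -> bool:
    """Hypothetical call rule: amplified if the tumor/normal ratio reaches the threshold."""
    return ratio >= threshold


# Hypothetical Ct values for one patient's paired DNA samples.
ratio = relative_copy_number(22.0, 20.0, 25.0, 20.0)       # ddCt = -3
unchanged = relative_copy_number(24.0, 20.0, 24.5, 20.5)   # ddCt = 0
print(ratio, is_amplified(ratio), unchanged, is_amplified(unchanged))
```

Normalizing each tumor to its own patient's normal lung DNA, as in Fig. 5a, cancels differences in DNA input and primer behavior that are shared by the pair.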

Immunohistochemical staining. The H&E-stained slides of all primary
lung tumors were used to identify the most representative regions of each
tumor, and a tissue microarray (TMA) block was constructed as described⁴³.
Immunohistochemistry (IHC) was performed using both routine sections and
sections from the TMA block as described²⁴. Detailed methods and the
concentrations used for all antibodies are provided in the Supplementary
Methods.

Statistical methods. t-tests were used to identify differences in mean
gene-expression levels between comparison groups. Agglomerative hierarchical
clustering 4* was applied using the average linkage method to investigate
whether there was evidence for natural groupings of tumor samples based
on correlations between gene-expression profiles. To investigate the
robustness of the clustering inference, gene-expression values were
perturbed by adding random Gaussian error, of a magnitude estimated from a
duplicate sample, to each data point; the data were then reclustered to
determine concordance in each tumor's class membership. Pearson χ² and
Fisher's exact tests were used to assess whether cluster membership was
associated with physical and genetic characteristics of the tumors.
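The clustering and perturbation steps above can be sketched in plain Python. Assumptions not stated in the text: the distance between tumors is 1 minus the Pearson correlation of their profiles, the tree is cut at a fixed number of clusters, and concordance is scored as the fraction of tumor pairs whose same-cluster or different-cluster status is unchanged after perturbation. All names and data are illustrative.

```python
import math
import random
from itertools import combinations
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(samples: List[List[float]], n_clusters: int) -> List[int]:
    """Agglomerative clustering with average linkage on 1 - Pearson r; returns labels."""
    n = len(samples)
    dist = [[1.0 - pearson(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def pair_concordance(labels1: List[int], labels2: List[int]) -> float:
    """Fraction of sample pairs whose co-membership agrees (label-permutation invariant)."""
    pairs = list(combinations(range(len(labels1)), 2))
    agree = sum((labels1[i] == labels1[j]) == (labels2[i] == labels2[j]) for i, j in pairs)
    return agree / len(pairs)


def perturbation_concordance(samples: List[List[float]], n_clusters: int,
                             noise_sd: float, seed: int = 0) -> float:
    """Add Gaussian noise to every value, recluster and compare with the original labels."""
    rng = random.Random(seed)
    base = average_linkage(samples, n_clusters)
    noisy = [[v + rng.gauss(0.0, noise_sd) for v in s] for s in samples]
    return pair_concordance(base, average_linkage(noisy, n_clusters))


# Hypothetical profiles: three rising and three falling across five genes.
samples = [[1, 2, 3, 4, 5], [1.1, 2.0, 3.2, 3.9, 5.1], [2, 3, 4, 5, 6.2],
           [5, 4, 3, 2, 1], [5.2, 4.1, 2.9, 2.0, 1.1], [6, 5, 4, 3, 2.1]]
labels = average_linkage(samples, n_clusters=2)
concordance = perturbation_concordance(samples, n_clusters=2, noise_sd=0.05)
print(labels, concordance)
```

Pair-based concordance avoids having to match cluster numbers between runs, since the arbitrary label assigned to each cluster can change after reclustering.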

To determine whether gene-expression profiles were associated with
variability in survival times, two separate but complementary approaches
were used. In the first approach, the 86 tumors were randomly assigned to
equivalent training and testing sets consisting of equal numbers of stage I
and III tumors in order to validate a novel risk-index function that captured
the effect of many genes at once. In the second approach, cross-validation⁴⁴
was used to identify the genes associated with survival more robustly.
Briefly, in a 'leave-one-out' cross-validation procedure, 85 of the 86
tumors (the training set) were used to identify genes that were univariately
associated with survival. The risk index was defined as a linear combination
of the gene-expression values for the top genes identified by univariate Cox
proportional-hazards regression modeling⁴³, weighted by their estimated
regression coefficients. Kaplan-Meier survival plots and log-rank tests were
then used to assess whether the risk-index assignment to high- or low-risk
categories was validated in the test set. A more detailed description is
provided in the Supplementary Methods online.
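The leave-one-out risk index can be sketched as follows, writing the risk index of a tumor as R = Σ_g β_g x_g over the selected genes, where β_g is the univariate Cox coefficient of gene g and x_g its expression value. Ranking genes by the absolute Wald statistic, Breslow handling of tied times and a median-of-training cutoff for the high/low split are assumptions made for illustration; the function names and toy data are hypothetical, and the authors' exact gene number and cutoff are in their Supplementary Methods.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def cox_univariate(x: Sequence[float], times: Sequence[float], events: Sequence[int],
                   max_iter: int = 50, tol: float = 1e-9) -> Tuple[float, float]:
    """One-covariate Cox model fitted by Newton-Raphson (Breslow ties); returns (beta, Wald z)."""
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if not e_i:
                continue
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]  # risk set at t_i
            shift = max(beta * x[j] for j in risk)  # keeps exp() from overflowing
            w = [math.exp(beta * x[j] - shift) for j in risk]
            s0 = sum(w)
            mean_x = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            mean_x2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
            score += x[i] - mean_x          # derivative of the log partial likelihood
            info += mean_x2 - mean_x ** 2   # observed information (weighted variance)
        if info <= 1e-12:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    wald_z = beta * math.sqrt(info) if info > 0 else 0.0
    return beta, wald_z


def select_genes(expr: List[List[float]], times: Sequence[float], events: Sequence[int],
                 top_k: int) -> Tuple[List[int], List[float]]:
    """Rank genes by |Wald z| from univariate Cox fits; return top gene indices and betas."""
    fits = []
    for g in range(len(expr[0])):
        beta, z = cox_univariate([row[g] for row in expr], times, events)
        fits.append((abs(z), g, beta))
    fits.sort(reverse=True)
    top = fits[:top_k]
    return [g for _, g, _ in top], [b for _, _, b in top]


def risk_index(sample: List[float], genes: List[int], betas: List[float]) -> float:
    """Linear combination of expression values weighted by Cox coefficients."""
    return sum(b * sample[g] for g, b in zip(genes, betas))


def leave_one_out_risk_groups(expr: List[List[float]], times: Sequence[float],
                              events: Sequence[int], top_k: int) -> List[str]:
    """Refit on all other tumors, then call the held-out tumor high or low risk."""
    labels = []
    for i in range(len(expr)):
        train = [k for k in range(len(expr)) if k != i]
        tr_expr = [expr[k] for k in train]
        genes, betas = select_genes(tr_expr, [times[k] for k in train],
                                    [events[k] for k in train], top_k)
        cutoff = median(risk_index(s, genes, betas) for s in tr_expr)  # assumed cutoff
        labels.append("high" if risk_index(expr[i], genes, betas) > cutoff else "low")
    return labels


# Hypothetical toy data: rows are tumors, columns are genes; 1 = death observed.
expr = [[3.0, 1.0], [2.5, 1.2], [2.8, 0.9], [1.5, 1.1],
        [1.0, 1.0], [2.0, 1.05], [0.5, 0.95], [0.8, 1.0]]
times = [5, 8, 12, 20, 25, 30, 40, 50]
events = [1, 1, 1, 1, 0, 1, 0, 0]
beta0, z0 = cox_univariate([row[0] for row in expr], times, events)
labels = leave_one_out_risk_groups(expr, times, events, top_k=1)
print(round(beta0, 3), labels)
```

The held-out tumor never influences gene selection, coefficients or cutoff for its own call, which is what lets the resulting high/low labels be tested with Kaplan-Meier plots and a log-rank test without optimistic bias.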



Note: Supplementary information is available on the Nature Medicine website.